SDA 3.5 Documentation for RECODE

NAME

recode - recode variables

USAGE

recode -b filename

DESCRIPTION

RECODE uses one or more existing variables as input to create a new SDA variable.

Ordinarily this program is invoked by the Web interface for the SDA programs, and the user does not have to deal with the keywords given in this document. Output from the program is in HTML, which can be viewed with a Web browser. Users who run this program interactively should see the online help document.

It is also possible to run the program directly by preparing a command file, which specifies the variables to be analyzed and the options to use. This document explains how to prepare such a file. The name of this batch command file is specified to the program after the ‘-b’ option flag.

BATCH FILE LAYOUT

The batch file is laid out in separate parts, with asterisks (*) marking the end of each multi-line part. The parts, which can be given in any order, are as follows:

Definitions of the input and output variables.
Rules or "map" for recoding the input variables into the new output variable.
Category labels for the new output variable (optional).
Descriptive text for the new variable (optional).

Since the "map," category labels, and descriptive text can have varying numbers of lines, each of those parts ends with an asterisk (*) on a line by itself. The general layout is as follows:

     (Input and output definitions)

     MAP=
     (Recode map)
     *

     CATLABELS=       [optional]
     (Category text and labels)
     *

     TEXT=            [optional]
     (Descriptive text)
     *

KEYWORDS FOR RECODE SPECIFICATIONS

The specifications are given in the form "keyword = something" with one keyword per line. Keywords may be given in any order, and the valid keywords are as follows (with significant characters shown in capital letters):

Defining Input Variables


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________


STudies=      path of source dataset(s)       Look for input variables
                                               only in current directory

INvars=       name(s) of input var(s)         REQUIRED

Defining the New Variable


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________


OUTSTudy=     path of study for new variable  Current directory

OUTVar=       name of new variable            REQUIRED

LABEL=        long label for new variable     No long label

CATlabels=    (precedes lines of category     No category text
                text - see details below)      or labels

MAP=          (precedes lines with recode     REQUIRED
               map or rules - see below)

MD=           list of invalid codes, ranges   No defined MD codes
              (also used for output value
               if input has missing data
               -- see below)

MIN=          minimum valid code              No defined minimum

MAX=          maximum valid code              No defined maximum

OVERwrite=    yes                             Do not overwrite new var
                                                if it already exists

OTHercases=   name of the input variable      Set to MD code
               from which to take the value    (or system-missing)
               for cases that do not match
               a pattern in the MAP

TEXT=         (precedes lines of descriptive  No item text
                text - see details below)

Other options


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

DIAGnostics=  yes                             No diagnostic summary of
                                                the new variable

COLorcoding=  yes                             No colored headings in the
                                                diagnostic output

GVARCase=     LOWER or UPPER                  Do not convert all variable
                                                names to lower/upper case

LAnguagefile= Name of file with non-English   English labels on
                labels and messages             output

SAVebatch=    name of directory               No file preserved with batch
                                                commands to create new var
                                                (for interactive version).
                                                The batch file name is the
                                                name of the new variable,
                                                with the suffix ‘.rec’

ABBREVIATIONS AND REPETITIONS

Most keywords can be abbreviated. Usually only two or three characters are required. The keyword for the category text for the new variable, for instance, can be given as "catlabels=" or "catlab=" or even "cat=". Either upper or lower case may be used. If keywords are repeated, the second specification will override the first.
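
The abbreviation rule can be modeled with a short Python sketch. This is an illustration only, not the code SDA uses. It assumes that a keyword is recognized when the text before the "=" begins with the keyword's significant (capitalized) characters, which is consistent with the examples in this document that write "study=" and "invar=". It also assumes that longer significant prefixes are tried first, so that "label" is not taken for an abbreviation of "LAnguagefile". Only the keywords listed above are covered; the older CSA keywords are not handled. The names KEYWORDS, significant_prefix and resolve_keyword are hypothetical.

```python
# Keywords as listed in this document; capitals mark the significant characters.
KEYWORDS = [
    "STudies", "INvars", "OUTSTudy", "OUTVar", "LABEL", "CATlabels", "MAP",
    "MD", "MIN", "MAX", "OVERwrite", "OTHercases", "TEXT", "DIAGnostics",
    "COLorcoding", "GVARCase", "LAnguagefile", "SAVebatch",
]


def significant_prefix(keyword):
    """Return the leading run of capital letters, lower-cased."""
    prefix = ""
    for ch in keyword:
        if not ch.isupper():
            break
        prefix += ch
    return prefix.lower()


def resolve_keyword(token):
    """Map a possibly abbreviated keyword to its full lower-case name.

    Assumption: a token matches when it starts with the significant prefix.
    Longer prefixes are checked first, so "label" resolves to LABEL rather
    than LAnguagefile. Returns None when nothing matches.
    """
    token = token.strip().lower()
    candidates = sorted(
        KEYWORDS, key=lambda k: len(significant_prefix(k)), reverse=True
    )
    for keyword in candidates:
        if token.startswith(significant_prefix(keyword)):
            return keyword.lower()
    return None
```

Under these assumptions, resolve_keyword("cat") and resolve_keyword("CATLAB") both give "catlabels", and an unrecognized word gives None.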

COMMENTS

Anything on a line beginning with "#" is ignored by the batch processor and can therefore be used for comments. Blank lines are also ignored.
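
As a rough illustration of the layout and comment rules described so far, the following Python sketch splits the text of a batch file into single-line keyword specifications and the three multi-line parts. It is not the SDA parser. The function name parse_batch and the shape of the returned values are hypothetical, the multi-line keywords are recognized by an assumed prefix test ("map", "cat", "text"), and comments and blank lines are skipped everywhere (this document does not say whether blank lines inside a TEXT= part are preserved).

```python
def parse_batch(text):
    """Split batch-file text into single-line specs and multi-line blocks."""
    specs = {}      # keyword as written (lower-cased) -> value
    blocks = {}     # "map" / "catlabels" / "text" -> list of lines
    current = None  # name of the block being collected, if any

    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and lines beginning with "#" are ignored.
        if not line or line.startswith("#"):
            continue
        if current is not None:
            # An asterisk on a line by itself ends the current block.
            if line == "*":
                current = None
            else:
                blocks[current].append(line)
            continue
        keyword, sep, value = line.partition("=")
        if not sep:
            continue  # not a "keyword = something" line; skipped in this sketch
        keyword = keyword.strip().lower()
        value = value.strip()
        for prefix, name in (("map", "map"), ("cat", "catlabels"), ("text", "text")):
            if keyword.startswith(prefix) and value == "":
                current = name
                blocks[name] = []
                break
        else:
            # A repeated keyword overrides the earlier specification.
            specs[keyword] = value
    return specs, blocks
```

The function returns a pair: a dictionary of the single-line specifications, and a dictionary holding the lines of each multi-line part without the closing asterisk.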

RECODE MAP

The rules for combining the values of one or more input variables into a value on the output variable are contained in the recode map. First put the MAP keyword on a line by itself; then put each recode rule on a separate line. The general format is as follows:

     New value: values on var 1 [; values on var 2; ... ]

Within a rule, the values for different input variables are separated by a semicolon (;). After the last rule, put an asterisk (*) on a line by itself. For example, to recode age and gender into 4 categories (younger male, younger female, older male, older female), one could construct the following recode map:

     map=
     1: 18-49; 1
     2: 18-49; 2
     3: 50-97; 1
     4: 50-97; 2
     *

Each recode rule can include more than one value or range for each input variable. A single asterisk (*) in a recode rule matches any VALID value of the corresponding input variable. Two asterisks (**) match ANY value, including missing-data (both user-defined and system-missing) and out-of-range values. It is possible to have more than one rule for a given output value -- notice that the output code 4 has three rules in the example given below.

     map=
     1: 1,3-5,7 ; 1-10
     2: 1,3-5,7 ; 11-50
     3: 1,3-5,7 ; 51-90
     4: 8-10,12 ; *
     4: 41,45,55; 11-90
     4: 61-90   ; *
     9:    **   ; **
     *

If a case matches more than one recode rule, the first rule encountered will apply. Notice in this example that the recode rule ‘**; **’ matches all values of the two input variables; any cases not covered by a rule higher up in the map will receive the value 9.
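
The matching behavior described above can be sketched in Python. This is a simplified model, not SDA's implementation, and it rests on several assumptions: codes are non-negative integers (character codes are not handled), None stands for the system-missing value, a value counts as VALID when it is neither system-missing nor one of the input variable's MD codes (MIN= and MAX= limits are not modeled), and an explicitly listed value or range matches whether or not that value is an MD code. All function names are hypothetical.

```python
def parse_values(spec):
    """Parse one input-variable part of a rule: "1,3-5,7", "*" or "**"."""
    spec = spec.strip()
    if spec in ("*", "**"):
        return spec
    ranges = []
    for item in spec.split(","):
        low, sep, high = item.strip().partition("-")
        ranges.append((int(low), int(high) if sep else int(low)))
    return ranges


def parse_map(lines):
    """Turn rule lines such as "4: 8-10,12 ; *" into (new_value, patterns)."""
    rules = []
    for line in lines:
        new_value, _, rest = line.partition(":")
        patterns = [parse_values(part) for part in rest.split(";")]
        rules.append((int(new_value), patterns))
    return rules


def matches(pattern, value, md_codes):
    """Test one input value against one pattern (see assumptions above)."""
    if pattern == "**":
        return True  # any value, including missing data
    if pattern == "*":
        return value is not None and value not in md_codes  # any VALID value
    return value is not None and any(lo <= value <= hi for lo, hi in pattern)


def recode(rules, values, md_codes):
    """Return the new value of the first matching rule, or None if none match."""
    for new_value, patterns in rules:
        if all(matches(p, v, m) for p, v, m in zip(patterns, values, md_codes)):
            return new_value
    return None
```

Here values holds one value per input variable, in the order the variables were named, and md_codes holds one set of MD codes per input variable. A result of None means that no rule matched.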

CASES UNMATCHED BY THE RECODE MAP

If a case does not match any of the recode rules, the output variable takes one of the following values, depending on the options that were specified:

If the ‘OTHercases=’ keyword was specified, that case will be assigned the value of the variable specified after that keyword.
If the ‘OTHercases=’ keyword was NOT specified, the case will be assigned the value specified with the ‘MD=’ keyword. If more than one MD value was specified, the first MD value is used for this purpose. Note that all values mentioned after the ‘MD=’ keyword are flagged as missing-data in the new variable.
If neither the ‘OTHercases=’ keyword nor the ‘MD=’ keyword has been specified, that case will be assigned the system-missing value.
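
The order of precedence just described can be written as a small Python function. This is only a sketch of the rule as stated here: the function name and its arguments are hypothetical, None stands in for the system-missing value, and the MD= specification is assumed to be a plain list of codes (ranges are not modeled).

```python
def value_for_unmatched(case, othercases=None, md=None):
    """Pick the output value for a case that matched no recode rule.

    case: dict mapping input variable names to this case's values.
    othercases: variable name given after OTHercases=, or None if absent.
    md: list of codes given after MD=, or None if absent.
    """
    if othercases is not None:
        return case[othercases]  # take the value of the named input variable
    if md:
        return md[0]             # the first MD value is used
    return None                  # system-missing
```

For example, with "othercases = age" an unmatched case keeps its original age; with only "md = 8,9" it receives the code 8.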

CATEGORY TEXT AND LABELS

Category text and labels for one or more codes of the new variable can be supplied. First put the ‘CATlabels=’ keyword on a line by itself; then put each code on a separate line, followed by one or more spaces or tabs and then the category text. A short label may be added after the category text, enclosed in square brackets as in the example below. (Programs such as TABLES and MEANS will use the short label for a category, if one is available.) Put an asterisk (*) on a line by itself after the last label. For example:

     catlabels=
     1 Professional and technical [Prf,Tech]
     2 Managers
     3 Blue collar workers [Blue Col]
     4 Other
     9 Missing
     *
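
The structure of a category label line can be illustrated with a short Python sketch. It assumes, as the example above suggests, that the short label is the bracketed text at the end of the line and that everything between the code and the bracket is the category text. The names LABEL_LINE and parse_catlabel are hypothetical, and this is not how SDA itself reads the lines.

```python
import re

# code, whitespace, category text, then an optional short label in brackets
LABEL_LINE = re.compile(r"^(\S+)\s+(.*?)\s*(?:\[([^\]]*)\])?$")


def parse_catlabel(line):
    """Return (code, category text, short label or None) for one label line."""
    match = LABEL_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a category label line: {line!r}")
    code, text, short = match.groups()
    return code, text, short
```

The code is kept as a string, and the short label is None when the line has no bracketed part.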

CHARACTER INPUT VALUES

RECODE works only with NUMERIC variables, but it can handle character values that have been defined as missing-data codes (such as ‘D’ or ‘R’). Example 4 below illustrates this application.

DESCRIPTIVE TEXT

Descriptive text may be stored with the new variable. This text can then be displayed when the variable is used in analysis programs or in a codebook. First put the ‘TEXT=’ keyword on a line by itself; then write as many lines of text as you wish to store with the new variable. Put an asterisk (*) on a line by itself after the last line of text.

MULTIPLE RECODES

RECODE commands for more than one variable can be included in the same batch file. After the first set of commands, put a line beginning with two asterisks (**); then the commands for another new variable can follow. The value of the ‘STudies=’ keyword is carried over from the previous set of commands, unless it is respecified.
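
The following Python sketch shows one way to picture this carry-over. It is a simplified model under stated assumptions, not SDA's own processing: the function names are hypothetical, any keyword beginning with "st" is treated as ‘STudies=’, and every line containing "=" is treated as a keyword line, so descriptive text or labels that happen to contain "=" would confuse it.

```python
def split_command_sets(text):
    """Split batch text into command sets at lines beginning with "**"."""
    sets, current = [], []
    for line in text.splitlines():
        if line.startswith("**"):
            sets.append(current)
            current = []
        else:
            current.append(line)
    # Keep a final set that is not followed by a "**" line.
    if any(line.strip() for line in current):
        sets.append(current)
    return sets


def studies_for_each(sets):
    """Return the STudies= value in effect for each command set."""
    in_effect, result = None, []
    for lines in sets:
        for line in lines:
            keyword, sep, value = line.partition("=")
            # Assumption: any keyword starting with "st" is STudies=.
            if sep and keyword.strip().lower().startswith("st"):
                in_effect = value.strip()
        result.append(in_effect)
    return result
```

A command set that does not respecify the study path is given the path from the most recent set that did.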

BACKWARD COMPATIBILITY

RECODE can read most older CSA recode commands. The following keywords are still recognized and are equivalent to the new keywords shown in parentheses:

longlabel (label)
labels (catlabels)

The missing-data keywords ‘md1=value1’ and ‘md2=value2’ are also recognized and are equivalent to the new form: ‘md= value1, value2’.

Note, however, that in the CSA recode rules, a single asterisk (*) matches ALL values of an input variable. SDA distinguishes between a single asterisk, which matches only the VALID values of an input variable, and two asterisks, which match ALL values.
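
When an older recode map is to keep its original meaning, each lone asterisk would have to become two asterisks. The Python sketch below shows such a rewrite for a single rule line. It assumes that CSA rules otherwise use the same "new value: values; values" format described in this document, which is not confirmed here. The function name csa_rule_to_sda is hypothetical.

```python
def csa_rule_to_sda(rule):
    """Rewrite one CSA recode rule so that it keeps its meaning under SDA.

    A pattern that is exactly "*" (match ALL values in CSA) becomes "**".
    """
    new_value, sep, rest = rule.partition(":")
    if not sep:
        return rule  # not a rule line (for example the closing "*"); unchanged
    patterns = [part.strip() for part in rest.split(";")]
    converted = ["**" if part == "*" else part for part in patterns]
    return f"{new_value.strip()}: {'; '.join(converted)}"
```

Lines without a colon, such as the asterisk that closes the map, are returned unchanged.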

EXAMPLES OF BATCH FILES

1. Collapse age into 3 categories


study = /sda/testdata
invar = age
outvar = age3
label = Collapsed age - 3 categories
md = 9

map=
1: 18-29
2: 30-49
3: 50-97
*

catlabels=
1 <30
2 30-49
3 50+
9 missing
*

**

2. Recode age and gender into 4 categories


invars = age gender
outvar = agesex
label = Age-gender typology
overwrite = yes
md = 9

map=
1: 18-49; 1
2: 18-49; 2
3: 50-97; 1
4: 50-97; 2
*

catlabels=
1 Yng Male
2 Yng Feml
3 Old Male
4 Old Feml
9 Missing
*

text=
This variable is a four-category typology
of age and gender
*

**

3. Collapse highest and lowest values of age


study = /sda/testdata
invar = age
outvar = age2070
label = Collapsed age - 20-70

# Note the use of the ‘othercases=’ option;
#  only the codes given in the map are changed.
othercases = age

# We want the previous MD codes of 99 to stay as MD
md = 99

map=
20: 1-20
70: 70-97
*

catlabels=
20 20 or younger
70 70 or older
*

**

4. Convert character missing data codes to numbers


invar = spend
outvar = numspend
label = Recoded spend variable
md = 8,9

map=
1: 1-2
2: 3
8: D
9: R
*

catlabels=
1 A lot
2 Not enough
8 Don’t know
9 Refused
*

**

CSM, UC Berkeley
April 12, 2011